Search results for "Markov decision process"
Showing 10 of 22 documents
MDP-Based Resource Allocation Scheme Towards a Vehicular Fog Computing with Energy Constraints
2018
As mobile applications deliver increasingly complex functionality, the demand for ever more intensive computation quickly exceeds the energy capabilities of mobile devices. On the one hand, to address such issues, the fog computing paradigm has been introduced to mitigate the limited energy and computation resources of constrained mobile devices by moving computation resources closer to their users, at the edge of the access network. On the other hand, most electric vehicles (EVs), with increasing computation, storage, and energy capabilities, spend more than 90% of their time in parking lots. In this paper, we conceive the basic idea of using the underutilized computation r…
Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing
2020
In mission-critical, real-world environments, there is typically a low threshold for failure, which makes interaction with learning algorithms particularly challenging. Here, current state-of-the-art reinforcement learning algorithms struggle to learn optimal control policies safely. When they fail, loss of control follows, potentially resulting in equipment breakage and even personal injury.
Increasing sample efficiency in deep reinforcement learning using generative environment modelling
2020
CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning
2020
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment. Learning typically happens through trial and error, using explorative methods such as ε-greedy. Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines. Model-based RL learns a model of the environment and uses it to learn the policy, while model-free approaches are fully explorative and exploitative, without considering the underlying environment dynamics. Model-free RL works conceptually well in simulated environments, and empirical evidence suggests that trial and error lead to a near-opti…
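As a concrete illustration of the ε-greedy rule the abstract mentions, here is a minimal sketch; the function name, Q-values, and action count below are invented for the example:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Illustrative Q-values for four actions in some state:
q = [0.1, 0.5, 0.3, 0.2]
action = epsilon_greedy(q)  # action 1 most of the time, a random action otherwise
```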
Some Effects of Individual Learning on the Evolution of Sensors
2001
In this paper, we present an abstract model of sensor evolution in which sensor development is determined solely by artificial evolution, while the adaptation of agent reactions is accomplished by individual learning. With the environment cast into an MDP framework, sensors can be conceived as a map from environmental states to agent observations, and Reinforcement Learning algorithms can be utilised. On the basis of a simple gridworld scenario, we present some results on the interaction between individual learning and the evolution of sensors.
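To make the "sensor as a map" idea concrete, here is a minimal sketch assuming a tabular Q-learning agent; the states, observations, and actions are invented for the example and are not taken from the paper:

```python
from collections import defaultdict

# A sensor maps environment states to (possibly coarser) observations;
# here two states share each observation, so the agent cannot tell them apart.
sensor = {0: "A", 1: "A", 2: "B", 3: "B"}

actions = ["left", "right"]
Q = defaultdict(float)       # Q[(observation, action)]
alpha, gamma = 0.1, 0.9      # learning rate, discount factor

def q_update(state, action, reward, next_state):
    """One tabular Q-learning step, indexed by what the sensor reports."""
    obs, next_obs = sensor[state], sensor[next_state]
    best_next = max(Q[(next_obs, a)] for a in actions)
    Q[(obs, action)] += alpha * (reward + gamma * best_next - Q[(obs, action)])
```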
Continuous energy-efficient monitoring model for mobile ad hoc networks
2021
The monitoring of mobile ad hoc networks is an observation task that consists of analysing the operational status of these networks while evaluating their functionalities. Monitoring has become of considerable importance for keeping the whole network and its applications working properly. It must be carried out in real time through measurements, logs, configurations, etc. However, achieving continuous energy-efficient monitoring in mobile wireless networks is very challenging, given the features of the environment as well as the unpredictable behavior of the participating nodes. This paper outlines the challenges of continuous energy-efficient monitoring over mobile ad hoc n…
MDP-based Resource Allocation for Uplink Grant-free Transmissions in 5G New Radio
2020
The diversity of application scenarios in 5G mobile communication networks calls for innovative initial-access schemes beyond traditional grant-based approaches. As a novel concept for facilitating small-packet transmission and achieving ultra-low latency, grant-free communication is attracting considerable interest in the research community and standardization bodies. However, when a network consists of both grant-based and grant-free end devices, how to allocate slot resources properly between these two categories of devices remains an open question. In this paper, we propose a Markov decision process based scheme which dynamically allocates grant-free resources based on a spec…
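The snippet cuts off before the scheme is defined, so the following is only a hypothetical sketch of how such a slot-allocation MDP might be set up; the state space, action space, and reward below are assumptions, not the paper's model:

```python
SLOTS = 10                     # slots available per frame (assumed)
states = range(SLOTS + 1)      # e.g. current backlog of grant-free devices
actions = range(SLOTS + 1)     # slots granted to grant-free traffic

def reward(backlog, granted):
    """Illustrative trade-off: serve the grant-free backlog while
    leaving the remaining slots to grant-based devices."""
    served = min(backlog, granted)
    grant_based_capacity = SLOTS - granted
    return served + 0.5 * grant_based_capacity
```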
Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop
2018
Active inference is an ambitious theory that treats perception, inference, and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. At its core, active inference is independent of extrinsic rewards, resulting in a high level of robustness across, e.g., different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of in…
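The snippet does not spell the objective out; in the standard active-inference literature it is commonly formalised as the expected free energy of a policy π, which action selection then minimises (stated here as background, not as this paper's contribution):

```latex
G(\pi) = \sum_{\tau} \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}
         \left[ \ln q(s_\tau \mid \pi) - \ln p(o_\tau, s_\tau \mid \pi) \right]
```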
Towards Model-Based Reinforcement Learning for Industry-Near Environments
2019
Deep reinforcement learning has, over the past few years, shown great potential in learning near-optimal control in complex simulated environments with little visible information. Rainbow (Q-Learning) and PPO (Policy Optimisation) have shown outstanding performance in a variety of tasks, including the Atari 2600, MuJoCo, and Roboschool test suites. Although these algorithms are fundamentally different, both suffer from high variance, low sample efficiency, and hyperparameter sensitivity that, in practice, make them a no-go for critical operations in industry.
Explainable Reinforcement Learning with the Tsetlin Machine
2021
The Tsetlin Machine is a recent supervised machine learning algorithm that has obtained competitive results in several benchmarks, in terms of both accuracy and resource usage. It has been used for convolution, classification, and regression, producing interpretable rules. In this paper, we introduce the first framework for reinforcement learning based on the Tsetlin Machine. We combined the value iteration algorithm with the regression Tsetlin Machine, as the value-function approximator, to investigate the feasibility of training the Tsetlin Machine through bootstrapping. Moreover, we document the robustness and accuracy of learning on several instances of the grid-world problem.
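A minimal sketch of fitted value iteration with bootstrapped targets; a linear least-squares regressor stands in for the regression Tsetlin Machine, and the toy chain MDP below is invented for the example:

```python
import numpy as np

def features(s):
    """Illustrative features for a state s of a small 1-D gridworld."""
    return np.array([1.0, s, s * s])

n_states, n_actions, gamma = 5, 2, 0.9
# Deterministic moves left/right on a chain; reward for reaching the right edge.
P = {(s, a): min(max(s + (1 if a else -1), 0), n_states - 1)
     for s in range(n_states) for a in range(n_actions)}
R = {(s, a): 1.0 if P[(s, a)] == n_states - 1 else 0.0
     for s in range(n_states) for a in range(n_actions)}

w = np.zeros(3)  # weights of the stand-in value-function regressor
for _ in range(50):
    X = np.array([features(s) for s in range(n_states)])
    # Bootstrapped targets: a one-step lookahead through the current V.
    y = np.array([max(R[(s, a)] + gamma * features(P[(s, a)]) @ w
                      for a in range(n_actions)) for s in range(n_states)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # refit V(s) ≈ features(s)·w
```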